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Abstract 

Background: Bovine tuberculosis (bTB) is a disease with major implications for animal welfare and productivity, as 
well as having the potential for zoonotic transmission. In Great Britain (GB) alone, controlling bTB costs in the 
region of £100 million annually, with the current control scheme seemingly unable to stop the inexorable spread of 
infection. One aspect that may be driving the epidemic is evolution of the causative pathogen, Mycobacterium 
bovis. To understand the underlying genetic changes that may be responsible for this evolution, we performed a 
comprehensive genome-level analyses of 4 M. bovis strains that encompass the main molecular types of the 
pathogen circulating in GB. 

Results: We have used a combination of genome sequencing, transcriptome analyses, and recombinant DNA 
technology to define genetic differences across the major M. bovis lineages circulating in GB that may give rise to 
phenotypic differences of practical importance. The genomes of three M. bovis field isolates were sequenced using 
lllumina sequencing technology and strain specific differences in gene expression were measured during in vitro 
growth and in ex vivo bovine alveolar macrophages using a whole genome amplicon microarray and a whole 
genome tiled oligonucleotide microarray. SNP/small base pair insertion and deletions and gene expression data 
were overlaid onto the genomic sequence of the fully sequenced strain of M bovis 2122/97 to link observed strain 
specific genomic differences with differences in RNA expression. 

Conclusions: We show that while these strains show extensive similarities in their genetic make-up and gene 
expression profiles, they exhibit distinct expression of a subset of genes. We provide genomic, transcriptomic and 
functional data to show that synonymous point mutations (sSNPs) on the coding strand can lead to the expression 
of antisense transcripts on the opposing strand, a finding with implications for how we define a 'silent' nucleotide 
change. Furthermore, we show that transcriptomic data based solely on amplicon arrays can generate spurious 
results in terms of gene expression profiles due to hybridisation of antisense transcripts. Overall our data suggest 
that subtle genetic differences, such as sSNPS, may have important consequences for gene expression and 
subsequent phenotype. 

Keywords: Bovine tuberculosis, Mycobacterium bovis, Microarray, Transcript, SNP, Antisense, Macrophage 



^BMC 

Genomics 



* Correspondence: paul.golby@ahvla.gsi.gov.uk 

^Animal Health and Veterinary Laboratories Agency, Woodham Lane, New 
Haw Addlestone, Surrey ^15 3NB, UK 

Full list of author information is available at the end of the article 

O© 2013 Golby et a!.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative 
BIoIVIGCI CGntrsI commons Attribution License (http://creativecommons.Org/licenses/by/2.0), which permits unrestricted use, distribution, and 
reproduction in any medium, provided the original work is properly cited. 



Golby et at. BMC Genomics 2013, 14:710 
http://www.bionnedcentral.conn/1471 -21 64/1 4/710 



Page 2 of 18 



Background 

Mycobacterium bovis is the causative agent of bovine 
tuberculosis (bTB), an endemic disease of cattle in 
Great Britain (GB) with the potential for zoonotic trans- 
mission to humans. In GB the primary control of bTB is 
through 'test and slaughter' surveillance, whereby cattle 
that are positive to the tuberculin skin test [1] are 
removed from the herd and slaughtered. In spite of this 
approach, which has been in place since the 1950s, the 
number of TB-positive cattle slaughtered is increasing 
year on year - approximately 30,000 cattle were tested and 
slaughtered between 2012-2013, compared to 300 between 
1995-1996 (http://www.defra.gov.uk/animal-diseases/a-z/ 
bovine-tb/). The UK (GB and Northern Ireland) govern- 
ments currently spend approximately £100 million per 
year collectively on control measures and compensation 
to farmers for slaughtered cattle. The failure of the test- 
and-slaughter policy to control the spread of infection 
in large parts of GB suggests that we need a much 
greater understanding of the TB disease dynamic, includ- 
ing the role of pathogen diversity as a potential driver of 
this process. 

M bovis isolates that are cultured from skin test-reactor 
animals are currently genetically typed using a combination 
of spoligotyping [2] and VNTR [3] . Spoligotyping exploits 
a polymorphic region of the genome called the DR locus 
which consists of multiple, identical 36bp repeats inter- 
spersed with unique sequences known as spacers. Isolates 
of M, bovis differ in the presence or absence of spacers 
and adjacent DRs, allowing a 'barcode' to be generated for 
each molecular type. Spoligotypes 9 and 17 are the domin- 
ant molecular types in the UK, with more than one third 
of all isolates corresponding to Type 9 and a quarter to 
Type 17. VNTR measures the variation at repeat se- 
quences in 6 regions of the genome. There are 6 major 
VNTR types for Type 9, while all others show only one 
dominant profile, suggesting that M, bovis Type 9 
strains are more genetically variable compared with 
other spoligotypes. Integration of molecular typing 
with geographical information systems allows temporal 
and spatial distribution of molecular types to be 
mapped across GB. Type 9 isolates are widely distrib- 
uted across GB, while type 17 is an emerging clone 
which has expanded out of foci around Gloucester, 
Hereford and Worcester. Similarly, Types 25 and 35 
have expanded out of Staffordshire/Shropshire and 
Hereford/Worcester, respectively. Between them, types 
25, 35, 9 and 17 encompass the diversity of the major 
clonal lineages of M, bovis circulating in the UK. 

An analysis of molecular typing data from ~ 11,500 
M, bovis isolates revealed that the population structure 
of M, bovis in GB could not be explained by random 
mutation and drift and instead, it appeared that certain 
strains were increasing at a faster rate relative to others 



[4]. One suggestion for the clonal expansion' of GB 
M, bovis genotypes was that certain genotypes had a 
selective advantage over others leading to an increase 
in their frequency in the population [4]. Supportive of 
this hypothesis, several lines of evidence have suggested 
that M, bovis isolates show phenotypic differences to 
each other. Fourier-Transform Infrared Spectroscopy 
(FT-IR) has been used to generate metabolic profiles of 
the 10 major spoligotype groups of M, bovis isolates 
circulating in GB. Clustering analysis of the resulting 
spectra showed that the spectra could be differentiated 
according to spoligotype, indicating that strains of dif- 
ferent spoligotypes possess phenotypically distinct traits 
[5]. In addition, it has also been shown that type 17 iso- 
lates have lower incorporation rates of propionate into 
membrane lipid components compared to other field 
strains, suggesting a degree of metabolic remodelling in 
the type 17 lineage [6]. Hence it appears that genetic 
differences across M, bovis lineages may impact on 
phenotypic traits. This latter finding may have import- 
ant implications for vaccine and diagnostic test develop- 
ment, in terms of which experimental challenge strains 
to test vaccines against or on influencing diagnostic test 
performance. 

In an attempt to better define genetic differences 
across the major M. bovis lineages circulating in GB 
that may give rise to phenotypic differences of practical 
importance, we have used a combination of genome 
sequencing, transcriptome analyses, and recombinant 
DNA technology. The genomes of three M. bovis field 
isolates were sequenced using Illumina sequencing 
technology and strain specific differences in gene ex- 
pression were measured during in vitro growth and in 
ex vivo bovine alveolar macrophages (M(|)) using a 
whole genome amplicon microarray. Recent discover- 
ies of small non coding RNA within mycobacteria [7,8] 
prompted us to assess differences in sRNA expression 
across the isolates using a whole genome tiled oligo- 
nucleotide microarray. SNP/small base pair insertion 
and deletions (INDELs) and gene expression data were 
overlaid onto the genomic sequence of the fully se- 
quenced strain of M, bovis 2122/97 to Unk observed 
strain specific genomic differences with differences in 
RNA expression. 

Results 

Comparative genomics of M. bovis field isolates using 
whole genome sequencing and microarrays 

The strains for this study were chosen to reflect the 
genomic diversity of the M, bovis population circulating 
in GB, and are listed in Table 1. M, bovis strains were 
typed using a combination of spoligotyping and VNTR. 
For each spoligotype group, an isolate which possessed 
the most common VNTR profile was selected, so that 
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Table 1 Nl. bovis field strains used in this study 



AHVLA ID 


International spoligotype ID [9] 


UK spoligotype ID 


VNTR ID 


Year of isolation 


UK county of isolation 


1121/01 


SB0263 


17 


7555*33.1 


2001 


Wiltshire 


2122/97 


SB0140 


9 


8555*33.1 


1997 


Devon 


2451/01 


SB0129 


25 


6554*23.1 


2001 


Clwyd 


1307/01 


SB0134 


35 


3354*33.1 


2001 


Shropshire 



Indicates partial number of repeats. 



each chosen strain was the most representative of each 
spoUgotype group (Table 1). Of the four studied strains, 
2451/01 and 1307/01 diverged earliest during descent 
from the most recent common ancestor of M. bovis in 
GB and are more distant to strains 1121/01 and 1307/ 
01 (Smith, N. personal communication). All 4 strains 
were isolated from diseased cows belonging to herds 
which were taken from farms in geographically diverse 
areas of the country. 

The genomes of the three M. bovis strains 1121/01, 
2451/01 and 1307/01 were paired-end sequenced using 
Illumina sequencing technology. Processed sequence 
reads were mapped to the genome of the fully sequenced 



and annotated strain 2122/97 [10] to identif)^ SNPs. 
SNPs were identified across all four sequenced M. bovis 
strains, and their positions, together with their SNP 
class, are listed in Additional file 1. The evolutionary 
relationships between the three sequenced strains are 
depicted in Figure 1 using a phylogenetic tree and a 
distance matrix plot. Three genome sequenced members 
of the Mycobacterium tuberculosis complex, M, bovis 
BCG Pasteur, M, tuberculosis H37Rv and M, africanum 
GM041182, were included in the analysis to place the 
M, bovis strains in a wider mycobacterial context. The 
numbers of SNPs between M. bovis 2122/97 and the 
three newly sequenced M, bovis strains were found 
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0 
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717 


2280 


114 


431 


552 


M. africanum 


1887 


0 


1897 


2177 


1878 


1847 


1834 


Mb BCG 


717 


1897 


0 


2290 


710 


679 


668 


Mtb H37RV 


2280 


2177 


2290 


0 


2273 


2240 


2227 


Mb 1121/01 


114 


1878 


710 


2273 


0 


421 


542 


Mb 2451/01 


431 


1847 


679 


2240 


421 


0 


505 


Mb 1307/01 


552 


1834 


668 


2227 


542 


505 


0 



Figure 1 Whole genome SNP-based evolutionary analysis of M. bovis sequenced strains, (a). Phylogenetic tree with numbers above the 
branches indicating the number of SNPs identified between the organism and its common ancestor, (b) Distance matrix plot showing the 
number of SNPs present between selected pairs of strains. 

V ) 
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to be consistent with their predicted evolutionary distances 
from each other. Strain 1121/01 (type 17) is most closely 
related to the original genome sequenced strain 2122/97 
(type 9) with 114 SNPs, whereas the more distantly related 
strains 2451/01 (type 25) and 1307/01 (type 35) have 431 
and 552 SNP differences respectively. All four sequenced 
M, bovis strains were more distantly related to M, 
africanum (-1800 SNPs) and M, tuberculosis H37Rv 
(--2200 SNPs). 

In addition to SNPs, both large and small INDELs can 
be inferred from NGS data, although there are several 
challenges involved [11]. Large sized insertions (>500 
bp) are particularly difficult to identify with accuracy as 
they require the de novo assembly of the reads that fail 
to map to the reference genome and the subsequent 
database searching with the resulting contigs. Accurate 
identification of large sized deletions (>500 bp) are, how- 
ever, more easily identifiable and Additional file 2 lists 
those that have been identified by NGS in the genomes 
of the three sequenced strains. As poor coverage can 
confound the identification of deletions, we sought to 
use microarray technology to confirm the deletions iden- 
tified by the NGS data. Genomic DNA was isolated from 
all four strains, labelled with fluorescent dyes and 
hybridised to a whole genome M, tuberculosis I M, bovis 
amplicon microarray (see Methods). Table 2 lists several 
LSPs that were detected across the strains using 
microarrays. Both NGS and microarray data predict the 
presence of the large 6.8kb deletion (RDbovis(d)_0173) 
which encompasses genes Mb 1963-Mb 1971 and appears 
to be specific to UK strains belonging to Type 17 that 
has been described in a previous study [12]. Several of 
these gene products are predicted to encode proteins in- 
volved in lipid metabolism, but the lipid composition of 
several type 17 isolates was found to be no different to 
other M, bovis strains, although their ability to incorpor- 
ate propionate into mycolic acids was found to be lower 
[6]. A smaller 1.6 kb deletion specific 1307/01 that com- 
prises the 3' end of Mb2056c, Mb2055c and the 5' end 



of p]<fB (Mb2054c) was also detected by both NGS and 
microarray data. Due to a single base deletion, Mb2056c 
and Mb2055c are pseudogenes in 2122/97, but the two 
genes exist as one intact functional gene in 2145 and 
H37Rv (Rv2030c). The pfl<B gene encodes a phospho- 
fructokinase homologue and is strongly immunogenic in 
human TB patients, while Rv2030c encodes an erythro- 
mycin esterase. Both pf<B and Mb2056c are members of 
the DosR regulon, which are highly upregulated under 
anaerobic conditions and have been implicated in bac- 
terial persistence in vivo [13]. Other smaller deletions 
detected include a deletion of a probable lipid transfer 
protein encoding gene Mb 1699c, which is specific to 
1307/01, and an aldo/keto reductase encoding gene, 
Mb2320 that is specific to 1121/01. 



Linking SNPs to genes that show differential expression 
amongst M bovis strains grown under vitro conditions 
and in ex vivo macrophages 

The four M. bovis field strains were grown to mid- 
logarithmic phase in pyruvate-containing Middlebrook 
7H9 liquid media, and then used to infect bovine alveo- 
lar M(j) using a multiplicity of infection (MOI) of 10:1 
(bacilli: Mcj)). Mycobacterial RNA was recovered from 
infected host cells 4 and 24 hrs post infection using a 
differential lysis procedure and amplified using a modi- 
fied procedure similar to that described by van Gelder 
et al. ([14]; see Methods). As a control, RNAs were also 
extracted from strains that had been incubated statically 
in RPMI cell culture media for a period of 4 hrs. To 
eliminate potential skewing effects on the transcriptome 
resulting from the amplification process, comparisons 
were made only between amplified RNA generated from 
samples collected at the same time point and biological 
replicate. For the in vitro growth condition, total RNAs 
were extracted from the four strains grown in a pyruvate- 
containing Middlebrook 7H9 liquid media and rolled 
during incubation. 



Table 2 Large sequence polymorphisms present in M. bovis field Isolates 



Mb CDS 


Mtb CDS 


Genomic postn. 
(wrt to 2122/97) 


Size of 
deletion (bp) 


Gene(s) 


2122/97 


1121/01 


2451/01 


1307/01 


Product(s) 


Mbl699c 


Rvl627c 


n/d 


n/d 










DEL 


Probable lipid transfer protein 


'Mbl963c- 
Mbl971 


Rvl928c- 
Rvl936 


2172231-2179038 


6807 


tpx, fodE17, 
fodElS, echAlS 




DEL 






Gene products have roles in 
lipid metabolism 


Mb2054c 


Rv2029c 


2259798-2261457 


1659 


pfkB 








DEL 


Possible phosphofructokinase 


Mb2055c/ 
Mb2056c 


Rv2030c 














DEL 


Conserved hypothetical protein 
(frame-shifted in 2212/97) 


Mb2320 


Rv2298 


n/d 


n/d 




DEL 


DEL 






Aldo/keto reductase 


Mb3923c 


Rv3894c 


n/d 


n/d 








DEL 




Conserved membrane protein 
(frame-shifted in 2212/97) 



DEL indicates genes that are deleted from strains. 
^Mostowy et al., 2005; n/d, not determined. 
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The RNAs extracted from each of the growth condi- 
tions were converted to cyanine labelled cDNA using re- 
verse transcriptase and hybridised to whole genome 
amplicon microarrays. Using only those genes that are 
common to all four strains, we found a total set of 70 
genes that showed a 2.5-fold or more difference in ex- 
pression in one or more strains when pairwise compari- 
sons were made between the transcrip tomes of 2122/97 
and 1121/01, 2451/01 or 1307/01 (Additional file 3). A 
subset of these 70 genes is shown in Table 3 where key 
examples of alterations in metabolic processes are 
shown. The numbers of genes that were found to show 
differential expression across the four strains reflected 
the evolutionary distances between 2122/97 and the 
other 3 strains. Thus, 1121/01, which is closely related 
to 2122/97, shows only 5 differentially expressed genes, 
while the most distantly related strain 1307/01 shows 56 
genes differently expressed compared to 2122/97. Of 
these 56 genes, 5 were specific to the in vitro condition, 
while 19 were specific to the M(|). Ten genes were com- 
mon to both conditions, which serves to validate the 
technical reproducibility of the RNA amplification 
process. 

Using the genome sequencing information determined 
for each of the four strains, we attempted to correlate 
the observed strain-specific differences in gene expres- 
sion with the presence or absence of mutations within 
the coding regions or promoters of those genes that 
show differential gene expression, or in genes that are 
known to regulate the activity of those genes. Mb 1749c 
and Mb 1750c are two genes that are specifically 
upregulated in 1307/01 and encode a toxin and antitoxin 
(TA) pair, respectively, belonging to type II TA systems 
of the VapBC family [15]. Members of VapB type toxins 
contain PIN domains that cleave RNA and thus function 
to control translation of mRNA transcripts [16]. The 
homologous genes from strain 1307/01 show up to 19- 
and 10-fold higher levels of expression, respectively, than 
those of the other three strains. An analysis of the cod- 
ing sequences of Mb 1749c across all four strains re- 
vealed that the 1307/01 homologue has a unique nSNP 
at position 1932704 (wrt 2122/97 genomic sequence), a 
C-T transition that results in the nonconservative substi- 
tution of Glyl9 to Asp. Research has shown that TA 
gene pairs negatively regulate their own expression 
through binding of the TA protein complex to the pro- 
moter region of the TA gene pair, thus preventing access 
to RNA polymerase [17]. The G19D mutation in 
Mb 1749c could therefore impair the ability of the com- 
plex to bind to the promoter resulting in the deregula- 
tion of the TA gene pair. 

Mb2007c, which shows a 4-fold higher expression in 
1307/01 only, encodes a transcriptional regulator of the 
LysR class. There are two SNPs present in the coding 



sequence of Mb2007c in 1307/01 which are absent in 
the homologues of the other three strains: the first is a 
nSNP which results in the conservative substitution of 
Argl37 to Gin, while the second is a more debilitating 
nonsense SNP, which ultimately leads to a protein whose 
length is only 60% that of the wild-type. Many regulators 
belonging to the LysR family regulate their own expres- 
sion through a negative autoregulatory mechanism simi- 
lar to that described above for VapBC TA systems [18]. 
A loss in protein integrity could, therefore, result in the 
regulator being unable to bind the regulatory region, 
leading to the observed upregulation in the expression 
of this gene in 1307/01. As the product of this gene is 
predicted to be a transcriptional regulator, it was specu- 
lated that the regulation of gene(s) controlled by regula- 
tor could be affected in 1307/01 due to the severely 
truncated form of this protein. As LysR regulators are 
often found to regulate genes that are divergently tran- 
scribed from the lysR gene, it was surprisingly to find 
that expression of the Mb2008 homologue in 1307/01, 
which is predicted to encode a lysine transporter, does 
not show any difference in expression in 1307/01 to 
2122/97. To define the regulon of this regulator, we first 
compared the transcriptomes of 2122/97 transformed 
with a multicopy plasmid expressing the truncated copy 
of mb2007c against a vector only control. No differences 
in expression were found (data not shown), which could 
indicate that the regulator does not control any other 
genes apart from itself, or that experimental conditions 
did not favour the active form of the regulator. LysR reg- 
ulators regulate expression of their regulon through 
binding of a co-inducer to the C-terminal domain, and 
the failure to observe any changes in gene expression 
could therefore be due to the absence of the co-inducer 
during the experiment. A further experiment to compare 
the profiles of 2122/97 expressing either the truncated 
or wild type forms of the protein also showed no differ- 
ences in expression (data not shown). 

Nitrite reductase catalyses the reduction of nitrite to 
ammonia and is strongly expressed during growth in the 
presence of nitrate or nitrite, but repressed in the pres- 
ence of ammonia [19]. The gene encoding the large sub- 
unit of the nitrite reductase, nirB (Mb0258), shows 
approximately 9-fold higher expression in 1307/01 com- 
pared to the other 3 strains in our standard ammonia 
containing 7H9 growth media, suggesting that the strain 
has lost regulatory control of this gene. Expression of 
nirB in M, tuberculosis has been shown to be controlled 
by the response regulator GlnR [20], but an analysis of 
the sequence of the glnR orthologue from all four strains 
revealed no differences in either the coding or promoter 
sequences. A comparison of the nirB sequence from all 
4 strains did, however, reveal the presence of a single 
base (C to T) transition leading to a sSNP that is specific 



Table 3 Fold change differences In gene expression In M. bovis field Isolates 1121, 2451 and 1307 compared to 2122 



Mb CDS Mtb CDS Common 



1121/01 



2451/01 



1307/01 



Product 



Assoc. SNP/lnDel 



Mb0038c Rv0037c 



Mb0124c 
Mb0258 
Mb0259 
Mb0428c 



MblOlS 

Mbll61 

Mbll62 

Mbl554c 

Mbl562 
Mbl619c 

Mb 1749c 

Mbl750c 

Mbl833c 

Mb 1834c 



7h9 RPMI 4hrM0 



Rv0120c fusA2 

Rv0252 nirB 

Rv0253 nirD 
Rv0420c 



Mb0734 Rv0713 

Mb0947c Rv0923c 

Mb0948c Rv0924c mntH 

Mbl013; Rv0987 
Mbl014 



Rv0988 
Rvll30 

Rvll31 gItAl 

Rvl527c pks5 

Rvl535 
Rvl593c 

Rv 1720c 

Rvl721c 

Rvl804 

Rvl805c 



24hrM0 7h9 RPMI 
4t 

3t 



2i 



2t 



4hr M0 
3t 



2t 



21 



5t lot 



3t 9t 



3i 



5t 



2t 



24hrM0 7h9 
3t 

3t 
9t 



RPMI 4hr M0 24 hr M0 



probable conserved 
integral membrane 
protein 

probable elongation 
factor 

12t 21 1 9t nitrite reductase 

(large subunit) 

3t 4t 5t nitrite reductase 

(small subunit) 

2t possible 

transmembrane 
protein 

probable conserved 
transmembrane 
protein 

21 conserved 

hypotheitical protein 

cation uptake system 

5t 7| 6t probable adhesion 

component transport 
ABC transporter 

3t 5t 5t possible conserved 

exported protein 

2| conserved 

hypothetical protein 

probable citrate 
synthase 

probable polyketide 
synthase 

5t hypothetical protein 

conserved 
hypothetical protein 

19t 18t 12t conserved 

hypothetical protein 

3t 9t 5t conserved 

hypothetical protein 

3| conserved 

hypothetical protein 

4t hypothetical protein 



lot 

6t 



sSNP at postn. 303227 in 1307/01 
only; C-T; 



nSNP at postn. 1 104263 in 2451/01 and 

1 307/01 ; A-G (STOP to W); sSNP at 
postn 1103991 in 2451/01 and 1307/01 



nSNP at postn. 1932704 in 
1307/01 only; C-T (G to D) 



Table 3 Fold change differences In gene expression In M. bovis field Isolates 1121, 2451 and 1307 compared to 2122 (Continued) 



Mbl835 Rvl806 PE20 

Mbl885c Rvl854c ndh 

Mbl914c Rvl882c 

Mb2007c Rvl985c 

Mb2015c RV1992C ctpG 

Mb2420c Rv2398c cysW 

Mb2421c Rv2399c cysT 

Mb2607 Rv2577 

Mb3194 Rv3169 

Mb3477c Rv3447c 

Mb3563c Rv3533c PPE62 



Mb3721c Rv3696c 



gIpK 



Mb3803 Rv3774 echA21 



3t PE family protein 

3| probable nadh 

dehydrogenase 

6t 3t 6t probable short-chain 

type dehydrogenase/ 
reductase 

4t 3t 3t probable 

transcriptional 
regulatory protein 

2| 3| probable cation 

transporter 

21 61 sulphate transporter 

21 41 sulphate transporter 

91 51 71 ^2i lOi 71 71 13i conserved 

hypothetical protein 
[first part] 

3t 3t 21 3t conserved 

hypothetical protein 

81 31 ^0i 91 21 71 probable conserved 

membrane protein 

5t 5t ppe family protein 

3| glycerol kinase 

8t 7t 22t possible enoyl-coA 

hydratase 



sSNP at postn 2122970 in 
2415/01 only:C-T 

nSNP at postn. 2208296 in 1307/01 
only; G-A (Q to Stop) 



nSNP at postn. 3812465 in 2451/01 
and 1 307/01 ;T-C (S to G) 



sSNP at postn. 4155803 in 
2415/01 only; G-A 



Up and down arrows indicate fold up- and downregulation, respectively, and empty cells indicate no change in expression. 
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to 1307/01. It was not readily apparent why a sSNP in the 
coding sequence of a gene should lead to an increase in 
expression of that gene, but there are several reports that 
show sSNPs leading to changes in stability of mRNA tran- 
scripts [21,22]. Rv0987 and Rv0988 of M tuberculosis 
H37Rv encode part of an ABC transporter and a putative 
secreted hydrolase, respectively. In 2122/97, a single base 
transition (G-A) introduces a stop codon that splits 
Rv0987 into the two pseudogenes, MblOlS and Mbl014. 
Previous microarray based gene expression studies by our 
group have shown that Rv0987 and Rv0988 in M, tubercu- 
losis show higher levels of expression than the orthologous 
Mbl013/Mbl014 and MblOlS, respectively, in M bovis 
2122/97 [23], and in the present study the Mbl013/ 
Mbl014 and MblOlS homologues in 2451/01 and 1307/ 
01 also showed higher expression (up to 10-fold) than the 
homologues in 2122/97 and 1121/01. Comparing the 
sequences of Mbl013/Mbl014 and MblOlS across all 4 
strains indicated that strains that show high expression 
have the 'G' allele. 

Mb3477c encodes an ATP binding membrane protein, 
part of the Esx4 secretion system [24], and gene shows up 
to 10-fold higher expression in 2451/01 and 1307/01 com- 
pared to 2122/97. The gene also contains an A to C transi- 
tion at position 3812465, a nSNP at position resulting in 
the non-conservative substitution of a serine to a glycine 
residue. 



Of the 19 genes that show specific differential expres- 
sion in the M(|), the most notable are Mb 19 14c and 
echA21, which show upregulation in 2451/01 only (up 
to 6- and 23-fold, respectively). Both genes encode 
proteins that could be involved in lipid metabolism, and 
both genes contain single sSNPs that are present in 
2451/01, but absent in the other three strains. 

Real time RT-PCR was used to verify a selection of 
genes that showed differential gene expression as pre- 
dicted by the microarray analysis. Figure 2 compares 
the fold changes in the expression levels of 4 genes as 
measured by microarray and real time RT-PCR. The 
nirB and Mb 1749c genes were selected because they 
showed strong upregulation in 1307/01 in both in vitro 
and ex vivo M(|) while Mbl914c and echA21 were chosen 
because the array data predicted them to be specifically 
upregulated in 2451/01 and only in ex vivo M(|). For 
each of the 4 genes, the strain dependent pattern of 
expression as measured by real time RT-PCR was consistent 
with that measured by microarray, although the fold 
changes measured by real time RT-PCR were higher 
than those measured by microarray. 

Functional analysis of SNP role in differential gene 
expression 

The above data showed that many of the strain specific 
differences in gene expression were linked to the presence 
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Figure 2 Confirmation of amplicon microarray results witii real time RT-PCR. The fold changes in gene expression for (a) Mbl750c, (b) nirB, 
(c) echA21 and (d) Mbl914c measured by microarray (open bars) in each of the four strains were compared to that measured by real time RT-PCR 
(closed bars). 
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of synonymous or non-synonymous SNPs located within 
the coding regions of the genes that show variable expres- 
sion. Non- synonymous SNPs lead to changes in amino 
acid sequence which can lead to changes in protein func- 
tion. The C to T transition at position 1932704 (wrt 2122/ 
97) in the coding sequence of the 1307/01 Mbl749c 
homologue leads to the non-conservative substitution of 
Glyl9 to Asp, and this nSNP appears to be linked to the 
upregulation of both Mbl749c and Mbl750c in that 
strain. In order to confirm this, a 0.9 kb DNA fragment 
containing the Mbl749c-Mbl7S0c-'MB17Slc region of 
1307/01 (containing the 'T' allele) and the equivalent 
region from 2122/97 (with the 'C allele) were PGR 
amplified and the fragments were cloned separately 
into the mycobacterial shuttle vector pKINT (see Methods) 
to create the constructs pPG107 and pPG106, respectively. 
The two constructs were introduced into Mycobacterium 
smegmatis mc^l55, separately, and then the expression of 
Mbl749c and Mbl750c in M. smegmatis pPGlOl was 
compared to that of M, smegmatis pPG102 using real time 
RT-PGR. Table 4 shows that the expression levels of 
Mb 1750c and Mb 1749c in the strain expressing the 
mutated forms of Mbl749c/Mbl750c are 13- and 9-fold 
higher, respectively, compared to the strain expressing 
the wild type forms, confirming that this SNP is respon- 
sible for the observed up-regulation of the two genes in 
1307/01. 

Synonymous substitutions do not lead to changes in 
protein sequence and have generally been considered to 
be silent' or benign. Recent studies, however, have sug- 
gested that sSNPs can have functional effects, such as 
decreased mRNA stability and translation [21,22]. In 
our own studies, we have found several genes whose ex- 
pression levels correlate with the presence of sSNPs in 
the coding regions of those genes. For example, a G-T 
transversion at position 303227 (wrt 2122/97) within 
the coding sequence of nirB of 1307/01 is a sSNP that 
appears to be linked with the upregulation in expression 
of nirB within that strain. To confirm that this is the 
case, we PGR amplified 3.5 kb DNA fragments containing 



Table 4 Confirmation of SNP linkage to upregulation in 
gene expression 



Strain 


Gene 


Fold change* 


M. smegmatis mc^l55 (pPGlOl) 


Mbl750c 


16.1 ± 3.2 


M. smegmatis mc^l55 (pPGlOl) 


Mbl749c 


12.5 ± 1.2 


M. smegmatis mc^l55 (pPG102) 


Mbl750c 


1 .9 ± 0.8 


M. smegmatis mc^l55 (pPG102) 


Mbl749c 


1 .4 ± 0.4 


Mb 2122/97 (pPGlOS) 


nirB 


2.0 ± 1 .0 


Mb 2122/97 (pPGlOS) 


nirD 


1 .3 ± 0.5 


Mb 2122/97 (pPG109) 


nirB 


52.9 ± 15.9 


Mb 2122/97 (pPG109) 


nirD 


3.1 ± 0.8 



*Fold changes are the mean ratios ± standard deviation. 



the hsp-nirB-nirD-cobU region of strain 1307/01 (containing 
the 'C allele) and the equivalent region from strain 2122/97 
(with the 'T' allele) and cloned them separately into the 
integrating vector pKINT to create the constructs 
pPGlOS and pPG109, respectively. These constructs 
were introduced into 2122/97 and the expression levels 
of nirB and nirD were found to be 30- and 2-fold, re- 
spectively, higher in the strain expressing the mutated 
nirBD locus compared to the strain expressing the wild- 
type form (Table 4). This confirms that this mutation is 
responsible for the upregulation of the two genes in this 
strain. 

Use of a high density tiled oligonucleotide microarray to 
detect differentially expressed small RNA transcripts in 
M. bovis isolates 

The M, tuberculosis I M, bovis amplicon arrays used in 
the present study were specifically designed to measure 
expression levels of genes annotated in the genomic se- 
quence of M, bovis 2122/97 [10]. They were not, how- 
ever, designed to monitor the expression of non-coding 
RNA such as small RNA within intergenic regions or 
antisense sRNA. Hence, a high density tiled oligonucleotide 
microarray consisting of approximately 180,000 partially- 
overlapping (10-base overlap) short 60 mer oligonucleo- 
tides was designed that offered an unbiased approach to 
the detection of strand specific transcripts encoded over 
the entire M. bovis 2122/97 chromosome. Total RNA 
that includes small sized (<100 nt) RNA species was 
extracted from the four M, bovis strains that had been 
grown in liquid media and hybridised to the oligo- 
nucleotide microarray. To avoid potential secondary 
strand synthesis during cDNA synthesis, which could 
be interpreted as sRNA, the RNA was directly labelled 
with cyanine based dyes. After pairwise comparisons 
were performed between 2122/97 and 1121/01, 2451/01 
or 1307/01, 220 oligonucleotide probes were identified 
that detected differentially expressed transcripts (2.5 fold 
cut off) in one or more of the three strains (Additional 
file 4). Only transcripts detected by multiple (2 or more) 
overlapping probes were regarded as genuine transcripts 
as those detected by single probes could be due to cross- 
hybridisation effects or represent spurious transcripts. 
Using these criteria, 26 transcripts, designated T1-T26, 
were found to show differential expression in one or more 
of the strains (Table 5), and those transcripts can be di- 
vided into those that are encoded within intergenic re- 
gions and those encoded within the genomic co-ordinates 
encompassing annotated coding sequences. Comparison 
of the differentially expressed gene lists identified using 
amplicon vs. oligonucleotide arrays (Tables 3 and 5), it 
is clear that many of the transcripts detected using the 
amplicon arrays are not necessarily encoded on the 
sense gene strand, as had been previous interpreted. For 



Table 5 Differential expression of RNA transcripts as detected by a tiled oligonucleotide microarray 



Transcript No. Probes Position Size cds Strand* 1121/01 2451/01 1307/01 CDS Product SNP 



T1 


3 


303138-303295 


157 


Mb0258/nirB 


A 






3t 


nitrite reductase (large sub-unit) 




T2 


5 


303260-303515 


255 


Mb0258/nirB 


S 






5t 


nitrite reductase (large sub-unit) 


sSNP at 303227; C-T in 1307/01 only 


T3 


2 


1105085-1105193 


108 


Mbl014 


A 




3T 


3T 


probable adhesion component 
of ABC transporter 




T4 


2 


1105158-1105315 


157 


Mbl014 


A 




3t 


3T 


probable adhesion 
component of ABC transporter 




T5 


3 


1 105158-1 105315 


157 


Mbl014 


S 




3t 


3t 


probable adhesion 
component of ABC transporter 




T6 


5 


1778876-1779131 


255 


Mbl618c 


A 








possible secreted lipase 


sSNP at 1778879; C-T in 1 121/01 only 


T7 


2 


1840174-1840282 


108 


Mbl672c 


S 






4i 


hypothetical protein 




T8 


3 


1932158-1932315 


157 


tRNA-Pro 


S 






3t 


proline tRNA 




T9 


5 


1932452-1932707 


255 


Mbl749c 


s 






4t 

^ 1 


toxin component of 
toxin-antitoxin system 


nSNP at 1932704; C-T (G to D) in 1307/01 only 


T10 


5 


1932746-1933075 


329 


Mbl750c 


S 






7t 


anti-toxin component of 
toxin-antitoxin system 




Til 


6 


2122959-2123263 


304 


Mbl914c 


A 




12t 




probable short-chain type 
dehydrogenase/reductase 


sSNP at 2122970; C-T in 2451/01 only 


T12 


2 


2324463-2324571 


108 


Mb2110 


S 






3t 


hypothetical protein 
(frame-shifted in 2122/97) 




T13 


2 


2328944-2329052 


108 


Mb2116c/pepE 


S 




2i 


3i 


probable dipeptidase 




T14 


2 


2329483-2329591 


108 


Mb2117 


A 




lOi 




5'-3' exonuclease 


nSNP at 2329583; A-G (comp. strand; 1 to T) 
in 2451/01 only 


T15 


3 


2868310-2868467 


157 


Mb2607 


A 




6i 


81 


hp (frame-shifted in 2122/97) 




T16 


2 


2868506-2868614 


108 


Mb2607 


A 




71 


71 


hp (frame-shifted in 2122/97) 


nSNP at 2868616; A-G (stop to W) in 
2451/01 and 1307/01; 


T17 


2 


3075740-3075848 


108 




1 




5t 


6t 






T18 


2 


3075887-3075995 


108 








5t 


6t 






T19 


2 


3076622-3076730 


108 








4t 


4t 






T20 


2 


3076769-3076877 


108 




1 




5t 


5t 






T21 


2 


3078728-3078836 


108 




1 




6t 


7t 






T22 


2 


3079169-3079277 


108 








4t 


6t 






T23 


2 


3079903-308001 1 


108 








6t 


7 






T24 


2 


3844577-3844685 


108 


Mb3509c 


S 




3t 


3t 


possible acyltransferase 
(frame-shifted in 2122/97) 




T25 


6 


4155501-4155805 


304 


Mb3803/echA21 


A 




5t 




possible enoyl-coA hydratase 


sSNP at 4155803; C-T (comp. strand) in 2451/01 only 


T26 


2 


4316987-4317095 


108 


Mb3927c 


A 




3T 




chp (frame-shifted in H37Rv) 





Up and down arrows indicate fold up- and downregulation, respectively, and empty cells Indicate no change In expression. 
^Strand, A Indicates antlsense, S, sense and I, Intergenlc. 
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example, the amplicon array data had appeared to suggest 
that Mbl914c and echA21 were upregulated in 2451/01, 
but the oligo array data indicates that transcripts 11 and 
25, which are encoded within the co-ordinates encoded by 
those two genes, are actually encoded on the antisense 
strands. This apparent discrepancy can be rationalised 
once we consider that double stranded amplicon micro- 
array probes cannot discriminate between transcripts 
encoded on the sense or antisense strands. Transcripts 11 
and 25 can therefore be considered as potential antisense 
sRNAs (asRNA), which could be involved in translational 
or post-transcriptional control of the sense transcript. 
Other potential cis-encoded sRNAs detected using the ar- 
rays include T6, T14, and T15/T16 which are encoded on 
the antisense strands to Mbl618c, Mb2117 and Mb2607, 
respectively, and for each of these transcripts, their expres- 
sion appears to be linked to the presence of a single SNP 
within the co-ordinates of the genes. The approximate 
boundaries of these transcripts can be derived from the 
genomic co-ordinates of the oligonucleotide probes that 
detect the expression of the transcript. Thus, the tran- 
scripts appear to be between 100-300 nt in size and the 
positions of the linked SNPs appear to be positioned either 
just upstream or within the predicted 5 'end of the tran- 
scripts (Figure 3). Three of the transcripts (Til, T14 and 
T25) are antisense to the central part of the sense encoded 
gene, while T6 is encoded antisense to the 5' end of 
Mbl618c. As well as antisense transcripts, we also saw the 
differential expression of sense transcripts. The amplicon 
microarray data (confirmed by real time RT-PCR) indi- 
cated that nirB is strongly upregulated specifically in 
1307/01 in both in vitro and ex-vivo M(|). An analysis of 
the oligonucleotide array data, however, indicates that 
there are two short transcripts, Tl and T2 (sense and anti- 
sense, respectively) that are encoded within the genomic 
co-ordinates of the nirB gene. T2 is the longer in size (255 
vs. 155nt) and more highly expressed (5 vs. 3-fold) than 
Tl, and both transcripts appear to be linked to the pres- 
ence of a SNP that is located within the middle of Tl and 
approximately 50nt upstream of T2. 

Some of the transcripts are bone fide gene sense 
strand mRNA transcripts, such as T9 and TIO which 
are encoded by Mbl749c and Mbl750c, respectively. 
Although it would appear that the two genes are tran- 
scribed separately, it is probable that the two transcripts 
are co-transcribed as the stop codon of Mbl750c over- 
laps the start codon of Mb 1749c. Eight of the tran- 
scripts listed in Table 5 are encoded within intergenic 
regions, 7 of which are encoded within the polymorphic 
direct repeat (DR) locus. The DR locus of strains belong- 
ing to the M, tuberculosis complex has been suggested to 
constitute a CRISPR locus which have been shown in 
many species of bacteria to be involved in protection 
against exogenous foreign DNA such as plasmids and 



phage [25]. All the DR encoded transcripts are short 
(approx. 100 nt), straddle contiguous repeat and spacer 
sequences and show approximately 5-fold higher levels 
of expression in 2451/01 and 1307/01 compared with 
2122/97 and 1121/01. 

Characterisation of differentially expressed cis asRNA 

The genomic co-ordinates of the oligonucleotide probes 
that detected the antisense species described above can 
only serve as approximate estimations as to their start 
and end points. Thus, we used 5' RLM-RACE (RNA 
Ligase Mediated Rapid Amplification of cDNA Ends) in 
an attempt to accurately define the transcriptional start 
sites (TSS) for the short sense transcript T2, and the anti- 
sense transcripts T6, Til and T25 described in the above 
section (see Methods). These transcripts were chosen as 
their expression levels are high and their transcript lengths 
were considered to be sufficiently long to enable the 
RLM-RACE methodology to work. Table 6 details the 
sizes of the PGR products obtained after RLM-RACE was 
performed using oligonucleotide primers designed to 
sequences predicted for transcripts T6, Til and T25. No 
PGR product was obtained for transcript T2. For each of 
the three transcripts, the TSS was determined to be a G 
residue, which is the most commonly used residue type 
for mycobacterial TSSs (Figure 4) [26]. For each of the T6, 
Til and T25 transcripts, expression of the asRNAs was 
linked to the presence of a SNP (G to T) proximal to 
the 5 ' end of the asRNA. Strains exhibiting the 'C allele 
showed no expression of the asRNA, whilst the strain 
that showed expression had the 'T' allele. An analysis of 
the nucleotide sequence in the vicinity of the SNPs 
reveals that for each of the three transcripts the SNP 
constitutes the 6th residue of a motif that has strong 
homology to the consensus sequence for the -10 element 
of Group A mycobacterial promoters (Figure 4) [26]. The 
finding that a 'T' residue is associated with expression is 
consistent with the consensus sequence which indicates 
that 86% of all -10 elements have a 'T' residue at the 6th 
residue position. Several residues that flank the -10 motif 
also show a degree of conservation. Sequence motifs 
which show strong homology to group A -35 elements 
are present 18-19 bp upstream of the putative -10 
elements, and the distances between the -35, -10 and 
TSS elements are consistent with those elements of 
the consensus sequence. No protein encoding open 
reading frames were detected within the T6, Til and 
T25 transcripts. 

In a parallel study, high density oligonucleotide micro- 
arrays were also used to interrogate the transcriptomes 
of M, tuberculosis H37Rv, M, bovis BGG Pasteur, Myco- 
bacterium caprae and M, bovis AN5 that had been 
grown in Middlebrook 7H9 media. As a result of these 
experiments, two asRNA species were found to be 
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Figure 3 Expressions and scliematic representation of genomic locations of selected cis-encoded antisense sRNAs identified using a 
tiled oligonucleotide microarray. Three asRNAs (open arrows) are (a) T6, (b) T14 and (c) T25. For each asRNA, a histogram plots the fold 
changes for each of the oligonucleotide probes that detected the asRNA, and for each probe the binding position relative to the 2122/97 
genome is indicated. Closed and open arrows indicate lengths and direction of transcription of genes and asRNAs, respectively. 



expressed within the antisense strands of the inol and 
narH genes of M, tuberculosis H37Rv, but not in any of 
the other 3 strains tested (data not shown). A compari- 
son of nucleotide sequences of the orthologous genes 
across the species suggested that expressions of the 
as_sRNAs correlated with the presence of a sSNP (C to T 



transition at positions 50555 and 1292100 wrt H37Rv 
genomic sequence for as_inol and as_narH, respectively) 
upstream of the asRNAs. Approximate information re- 
garding the transcriptional start site was deduced from the 
binding co-ordinates of the probes that detected the tran- 
scripts. As with the M, bovis antisense sRNAs described 
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Table 6 Determination of transcriptional start sites of 
antisense RNAs 



Transcript 


asRNA 


PCR 


TSS 


TSS residue postn 






prod, size 


residue type 


(wrt 2122/97) 


T6 


nnbl618c_as 


150 


G 


1 778886 


Til 


nnbl914c_as 


220 


G 


2122978 


T25 


echA21_as 


175 


G 


4155796 



above, the T residue associated with the expression of 
the M, tuberculosis asRNAs is part of a putative -10 
element. A -35 element with an appropriate spacing to 
the -10 element was identified for asjnol, but not for 
as_narH, suggesting that the as_narH promoter may 
belong to group B mycobacterial promoters that have 
a conserved -10, but no -35 motif [26]. 

Discussions 

The aim of this work was to define possible phenotypic 
variation across M, bovis field isolates through a combin- 
ation of genome sequencing, comparative genomics and 
transcriptome analyses from both in vitro and ex vivo con- 
ditions. Using these approaches we uncovered a range of 
novel findings, the most striking of which was the realisa- 
tion that genes that had been predicted to be differentially 
expressed based on amplicon-microarray data were in fact 



not upregulated, and that instead it was an antisense tran- 
script that was showing differential expression. Analysing 
both transcriptome and genome sequence data allowed us 
to identify SNPs responsible for the transcription of anti- 
sense RNAs, with generation of a consensus -10 promoter 
sequence the likely mechanism. Our results suggest that 
data generated from amplicon arrays in the past may need 
to be revisited, as it is possible that some coding- 
sequences identified as being differentially expressed were 
instead antisense transcripts. 

With the growth of technologies such as high density 
tiled oligonucleotide microarrays and next generation 
sequencing there has been a rapid increase in the num- 
ber of reports describing the existence of non-coding 
RNAs (ncRNAs) in bacteria. Non-coding RNAs broadly 
consist of two types, cis- and trans-encoded RNA. Trans 
RNA includes intergenic encoded RNA, while cis-encoded 
RNA includes 5' and 3' untranslated regions of mRNA 
and antisense RNA. To study the expressions of both cis- 
and trans encoded ncRNA we used a high density oligo- 
nucleotide tiled microarray since our amplicon microarray 
was unable to detect intergenic transcripts or differentiate 
between sense and antisense transcripts. Previous studies 
using M, tuberculosis have identified substantial amounts 
of ncRNA encoded in both intergenic and intragenic re- 
gions [7,8]. We detected substantial amounts of ncRNAs 



5' 18 bp 6 bp 3' 

I 1 I 1 

aS_mb1 61 8c GGGGmSCCGTACATGTTCGTGGTCCGGiaCAGrAGCTGGGTAGCGGTGACGGGCTGCGGA... . . 



5' 18 bp 7 bp 3' 
I 1 ( 1 

aS_mb1 914c CCCCriXSACGGCGTGTTTGGTCGCCC-AGMGAcTbcGATAcGcGGCATGCCATAGGTGCCC 

19 bp 7 bp 3» 

I 1 ( 1 

aS_eChA21 GTTGTrggrCGAGAACGTCCTTGATGCC GTAGACrGCC AAAGGTGGGTTGGCGGCGATCTCG 



16-21 bp 6-7 bp 

-35 -10 



Concensus seq T.,t^G.A.c ,a..' ' T,AiK*.R„a,:T,l g, 



17 bp 
I 1 

inOl -Mtb GGCTCrrGArgrCATCACCGACGATGGGTACCCrGGCGTCGGTGAACTTCTTGGCCCACA 



narH - Mtb _..CAGCGGCGGGATGTACCAGACCATCCGCATGGTCCG GrACTCrGGA TGTAGCGGCAGCGCCACCCGGTA.... . 

Figure 4 Promoters of anti sense RNAs. a. Promoters of the asRNAs as_mbl618c, as_1914c and as_echA21. -10 and -35 elements are 
indicated in bold and italics. Transcriptional start sites are indicated by large font G characters, while SNP residue that leads to the expression of 
the asRNA is indicated by a large font red T residue. The consensus sequence for group a mycobacterial promoters is indicated. Numerical 
subscripts indicate the percentage of the total number of promoters for which a transcriptional start site has been experimentally determined 
that show the indicated residue, b. Promoters of the differentially expressed asRNAs asjnol and as_narH in M. tuberculosis. -10 and -35 elements 
are indicated in bold italics. The red residue indicates SNP responsible for differential expression. 
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in M. bovis, including many instances of cis-antisense 
RNA species. Due to their perfect complementarity, cis 
asRNA form a duplex with the sense strand encoded 
transcript resulting in either degradation [27] or transla- 
tion inhibition [28] of the sense mRNA. Antisense 
RNAs vary in length, ranging from 10s to 1000 's of 
nucleotides and can be classified according to their 
encoded position with respect to the opposite sense 
encoded gene. Thus, they can be classified as 5' or 3' 
overlapping, while others are classified as internally 
located. The 1121/01 specific as_Mbl618c is an example 
of a 5 ' overlapping asRNA, which is encoded antisense to 
gene Mb 16 18c which is predicted to express a secretory 
lipase. The location of the asRNA transcript suggests it 
may function to prevent translation of Mbl618c mRNA 
by steric hindrance of the ribosome binding site. 

In the work presented here strain 2451/01 expressed 
two asRNAs, as_Mbl914c and as_echA21, that are not 
expressed by any of the other 3 strains. They are 
encoded within the central part of the opposite genes 
and are therefore likely to modulate the stability of the 
transcripts. Mb 19 14c encodes a short chain dehydro- 
genase while echA21 encodes an enoyl-CoA hydratase. 
Short chain dehydrogenases catalyse a wide range of 
functions so the precise function and identity of the 
substrate is difficult to deduce from sequence alone. 
Enoyl-CoA hydratases hydrate double carbon-carbon 
bonds of macromolecules and are vital in the metabol- 
ism of fatty acids. Both gene products would therefore 
appear to be involved in the metabolism of a macromol- 
ecule and their similar expression profiles in this strain 
could indicate involvement in the metabolism of the same 
molecule, or molecules that are of the same pathway. 

In many instances, upregulation of asRNA negatively 
correlates with the transcription of the antisense gene 
[27], but in many cases expression of the antisense tran- 
script has no effect on the transcription of the opposite 
gene. In our studies, expressions of as_Mbl618c, 
as_Mbl914c and as_echA21 did not appear to have any 
effect on the expressions of the opposite sense encoded 
genes (data not shown). We have shown that the 
expressions of the asRNAs are associated with the pres- 
ence of SNPs, which are either synonymous or non- 
synonymous with respect to the sense transcript, but 
upstream of the asRNA transcriptional start site. This 
highlights the fact that mutations can potentially affect 
expression of transcripts on both strands, and that the 
classification of a SNP is strand dependent. For each of 
the three asRNAs, the associated SNP was found to be 
located within a putative -10 promoter motif of group 
A mycobacterial promoters. The sixth residue of the -10 
hexamer motif consensus sequence is a strongly conserved 
'T' residue, which is present in 81% of all group A myco- 
bacterial promoter elements. Its importance is underlined 



by the finding that the strains that exhibit a 'C residue at 
this position show no detectable expression of the asRNA, 
while strains having a 'T' residue at this position exhibit 
expression. 

Single nucleotide polymorphisms were found to be 
the most frequent form of genetic variation that exists 
between the isolates, with a total of 1013 SNPs detected 
across the three strains 1121/01, 2451/01 and 1307/01 
compared to the reference strain M. bovis 2122/97. 
Non-synonymous SNPs, which include both non-sense 
and missense SNPs, are a class of SNPs most likely to 
impact on protein function and contribute to phenotypic 
variation. Non-sense SNPs, which results in the expression 
of a truncated polypeptide due to the introduction of a 
premature stop codon, were identified in five genes across 
the strains. Of these, we focussed our attentions on the 
non-sense SNP present in a gene encoding the LysR regu- 
lator, Mb2007c as mutations affecting regulators are likely 
to impact on the expression of one or more genes that are 
part of the regulon of the regulator and are therefore more 
likely to result in phenotypic variation. Experiments to 
compare the transcriptomes of a strain that exhibited the 
mutation with a strain overexpressing a functional regula- 
tor did not, however, reveal any differences. The reason 
for this unclear, but could reflect a requirement of the 
regulator for a co-inducer that was absent under the con- 
ditions of the experiment. The consequences of missense 
SNPs are more difficult to predict, as substitutions of one 
amino acid for another in a protein sequence do not ne- 
cessarily lead to a change in protein function. However, 
for genes that are controlled by an autoregulatory mech- 
anism, a mutation that affects the ability of the product of 
the gene to regulate itself will result in a change in expres- 
sion of the gene. In our studies, we have shown that the 
presence of a missense mutation in a VapB type toxin en- 
coding gene Mb 1749c in strain 1307/01 results in the 
upregulation in expression of the toxin- antitoxin encoding 
pair of genes Mbl750c-Mbl749c due to the inability of 
the encoded proteins to self regulate themselves. Toxin- 
antitoxin systems have a variety of proposed cellular func- 
tions including general regulation of mRNA stability levels 
in the cell [29]. Further experiments are required to fully 
understand the consequences of this mutation. 

Genomic deletions have played an important role in 
the evolution of strains belonging to the mycobacterial 
complex [30], and in the derivation of the tuberculosis 
vaccine strain M. bovis BCG [31]. In addition to the previ- 
ously described 6.8 kb gene deletion that is specific to 
strains having a spoligotype 17 pattern [6], we have identi- 
fied a 1.6 kb multi-gene deletion that is specific to strain 
1307/01 and encompasses genes that are part of the DosR 
regulon. However, one of the deleted genes exists as a 
pseudo gene in DosR in strain 2122/97, so its importance 
to the biology of M. bovis is unlikely to be significant. 
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Several other genes with internal deletions were detected 
but none of the encoded proteins have any significant 
similarity to any protein with a defined function. 

Conclusions 

In conclusion we have performed a comprehensive ana- 
lysis of 4 M. bovis strains of the most common molecular 
types circulating in GB. We show that while these strains 
show extensive similarities in their genetic make-up and 
gene expression profiles, they show distinct differences in 
the expression of a subset of genes. We provide functional 
data to show that SNPs can lead to the expression of anti- 
sense RNA, a finding with implications for how we define 
a silent' nucleotide change. Furthermore, we show that 
the interpretation of transcriptome data based solely on 
amplicon arrays could lead to artefacts due to expression 
of antisense transcripts, a caveat that needs to be kept in 
mind for previous studies of global expression analysis in 
bacteria. 

Methods 

Bacterial strains, media and growth conditions 

For the M(|) infection experiments, bovine alveolar M(|) 
were cultivated in tissue culture media RIO, which 
consisted of RPMI (Invitrogen) media plus 2 mM glu- 
tamine, 10% calf fetal serum and 1% amphotericin. 
Where used, antibiotics gentamycin and ampicillin 
were added at concentrations of 50 and 100 (ig / ml, 
respectively. M, bovis field strains were pre-grown in 
Middlebrook 7H9 broth supplemented with 10% albumin- 
dextrose catalase (ADC, Difco), 0.05% Tween and 10 mM 
pyruvate. Cultures were harvested in mid-logarithmic 
phase (ODeoo of 0.3-0.8), washed and then resuspended in 
RPMI containing 0.05% Tween 80. 

Isolation of bovine alveolar macrophages and infection 
with mycobacteria 

The lungs of a 6-8 week old male Holstein-Friesian calf 
were removed and a whole lung lavage procedure was 
performed to washout the alveolar M(|). Briefly, 4-5 x 
500 ml aliquots of Hanks' Balanced Sterile Salts solution 
(HBSS) were used to infuse the lungs via the trachea, 
and the washings were pooled in a sterile beaker. The M(|) 
cells contained in the washes were pelleted by centrifuga- 
tion at 500 x g for 10 mins at 4°C, washed and then 
resuspended in RIO growth media supplemented with 
antibiotics (R10+) to a concentration of 1-2 x 10^ / ml. 
Approximately 0.5-1.5 x 10^ M(j) were isolated per calf 
lung. 

Vented 225 cm^ tissue culture flasks containing R10+ 
media were seeded with 3-4 x 10^ alveolar M(|) and 
placed in a humidified 37°C incubator containing 5% 
CO2. Typically, 2-4 flasks were used per strain and time 
point. After 24 hrs, the growth media was decanted to 



remove non-adherent cells and then replaced with fresh 
R10+ media. After a further 24 hrs, the growth media 
was discarded and the M(|) monolayer was washed with 
RPMI to remove traces of the antibiotic containing 
growth media. The monolayer was then covered with 
RIO media without antibiotics (R10-) and then infected 
with mid-logarithmic phase grown mycobacteria using 
an MOI of approximately 10:1 (bacilli: M(|)). The AlvM(|) 
were incubated with mycobacteria for 4 hrs, after which 
the cell monolayers were washed with RPMI and then 
either processed for RNA extraction (4 hr time point) 
or incubated in fresh R10+ media for a further 20 hrs 
before being processed for RNA extraction (24 hr time 
point). 

Extraction and amplification of mycobacterial RNA from 
infected macrophages 

M(|) cell monolayers were lysed using a guanidinium 
thiocyanate (GTC) containing solution. The lysed M(|)s 
were vortexed and passed twice through a 21G blunt 
ended needle to sheer host genomic DNA and thereby 
reduce the viscosity of the solution. Mycobacterial cells 
were then pelleted by centrifugation at 4600 rpm for 20 
mins at room temperature and washed with GTC solu- 
tion to remove host genomic DNA. Cells were then 
resuspended in Trizol and RNA was extracted using the 
protocol outlined in Bacon et al. [32]. The amount of 
purified DNase-treated RNA recovered was of the order 
100-500 ng per time point. RNA was amplified using 
the 'MessageAmp II-Bacteria RNA Amplification Kit' 
(Ambion) according to the manufacturers' instructions. 
Using an input of 100 ng of unamplified RNA, 20-100 
(ig of amplified RNA was recovered. 

Amplicon microarray analysis 

For the in vitro growth experiments, three independent 
experiments (biological replicates) were carried out, and 
for each strain in each experiment two microarrays (tech- 
nical replicates) were performed. Thus, for each strain 6 
microarrays were performed. Three independent AlvM(|) 
infection experiments were carried out and for each 
experiment two microarrays were performed for each of 
the control RPMI samples, and the 4- and 24 hrs post- 
infection samples. Cy5 and Cy3 fluorescently-labelled 
probes were synthesised from RNA and genomic 
DNA, respectively, and hybridised to whole genome 
M, bovis I M, tuberculosis microarrays. The array design 
is available in B(iG@Sbase (accession number A-BUGS-31; 
http://bugs.sgul.ac.uk/A-BUGS-31) and also ArrayExpress 
(accession number A-BUGS-31). Details of probe synthesis, 
hybridization conditions and manufacture of the micro- 
array can be found in Golby et al. 2008. Microarrays 
were scanned using an Affymetrix 428 Microarray 
scanner and scanned images were quantified using 
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BlueFuse for Microarrays v3.2 software (BlueGnome). 
See Golby et al. 2008 for further details. 

Normalisation was performed by dividing the log ratio 
of the Cy5 to Cy3 signal for every spot by the median of 
the log ratios for all spots, except control spots. A median 
absolute (MAD) scale transformation was applied to the 
normalised data from the pas an additional normalisation 
step. For every microarray, duplicate spots were averaged, 
and then the average expression of every gene across all 
technical replicate microarrays was calculated. Averages 
of the three biological replicates were used to compare 
gene expression between strains. For each gene, a mod- 
erated t-test was applied and those genes with a P- value 
less than 0.05 were selected. From this gene list, those 
genes whose average expression differed by more than 
2.5-fold between strains were selected. Fully annotated 
microarray data have been deposited in B(iG@Sbase (acces- 
sion number: E-BUGS-150; http://bugs.sgul.ac.uk/E-BUGS- 
150) and also ArrayExpress (accession number: E-BUGS- 
150). 

Oligonucleotide microarray analysis 

Experiments were performed in a similar way to that 
described for the in vitro amplicon array experiments, 
except that the RNA was purified using the mirVana 
miRNA Isolation kit (Ambion), which is designed to 
capture small (>20 nt) RNA species. RNA and genomic 
DNA were directly labelled with Cy5- and Cy3, respect- 
ively, using the ULS microRNA labelling kit (Kreatech), 
according to the manufacturers instructions. Purified 
Cy5- and Cy3-labelled probes were co-purified and 
appUed to an Agilent 40K custom made tiled (10 nt 
overlap) 60-mer oligonucleotide microarray designed to 
the genomic sequence of M. bovis 2122197 191 , The array 
design is available in B(iG@Sbase (accession number 
A-BUGS-52; http://bugs.sgul.ac.uk/A-BUGS-52) and also 
ArrayExpress (accession number A-BUGS-52). Microarrays 
were hybridised at 65°C for 18 hrs and then washed in 
Wash Buffer 1 (Agilent) at room temperature for 1 minute. 
The slides were then washed in Wash Buffer 2 (Agilent) at 
37°C for 1 minute, dried and then scanned at 2 (im using 
an Agilent DNA microarray scanner. 

Tiling array data was analysed using the Limma package 
of R/Bioconductor [33]. The signal median was quan- 
tile normalised between arrays followed by a LOESS 
normalisation within arrays. Differential expression 
analysis was performed by pairwise comparison using 
linear models and empirical Bayes methods, and P 
values adjusted using the Benjamini and Hochberg's 
method to control for multiple testing. Fully annotated 
microarray data have been deposited in B(iG@Sbase 
(accession number: E-BUGS-150; http://bugs.sgul.ac. 
uk/E-BUGS-150) and also ArrayExpress (accession num- 
ber: E-BUGS-150). 



Whole genome sequencing of M. bovis field isolates 

Whole genome paired end (2x 76bp) sequencing was 
performed using lUumina HiSeq machines at the 
Wellcome Trust Sanger Institute (Hinxton, Cambridge). 
Raw sequence data was uploaded to the European Nucleo- 
tide Archive (ENA) and can be downloaded at http://www. 
ebi.ac.uk/ena/, accession numbers: ERX006616, ERX06617, 
ERX012284 and ERX012286. The FastQC (http://www. 
bioinformatics.babraham.ac.uk/proj ects/fastqc) program 
was used to analyse the quality of the raw data reads. Raw 
sequence data was trimmed to remove adapter sequences 
and nucleotides where the sequence quality score was 
below 20. Filtered reads were aligned to the M, bovis refer- 
ence strain 2122/97 [10] with the SMALT alignment 
program (http://www.sanger.ac.uk/resources/software/smalt) 
using default settings. The average mean coverages for the 
sequenced strains 1121/01, 2451/01 and 1307/01 were 
293, 224 and 234, respectively. Reads that did not map 
onto the reference genome were de novo assembled and 
blasted against the NCBI data base in order to find regions 
present in these strains but absent in 2122/01. Variant 
calling and the generation of consensus sequences 
were carried out using the SAMtools program suite 
(http://samtools.sourceforge.net). SNPs that had a mini- 
mum of quality score of 200 and had a minimum of four 
good quality forward (reverse) reads covering the SNP site 
and having the variant base were retained. The number of 
forward (reverse) reads mapping with good quality onto 
the SNP site having the same base as in the reference had 
to be less than 5% of the total number of forward (reverse) 
reads mapping with good quality onto the SNP site having 
the variant base. Sequences of 12 bases in length centred 
around SNP sites were blasted to the genome of the refer- 
ence strain 2122/97, and any SNP within an area with a 
90% hit to another area of the genome were filtered out. 
This eliminated most of the SNPs within the repetitive 
PE-PGRS regions. For all sequenced strains, more than 
99.9% of the genomes were covered by reads. 

Real time RT-PCR 

Quantitative real-time SYBR Green based PGR (qRT-PCR) 
experiments were performed using a RotorGene 3000 
(Corbett research) as described by Golby et al. [23]. Fold 
changes were calculated using relative standard curve 
method and per controls included no template and no 
reverse transcriptase. Primer pair sequences are given in 
Additional file 5, available with the online version of 
this paper. 

Construction of the Rvl 749c-Rv1 750c and nirBD 
overexpressing plasmids 

The Rvl749c-Rvl750c overexpressing plasmids pPG106 
and pPG107 were constructed by PGR amplification of a 
907bp fragment encompassing Rvl 749c-Rvl7S0c-'Rv 1751c 
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using primers tox_f and tox_r. For pPG106, the fragment 
was amplified using 2122/97/97 genomic DNA as a tem- 
plate, while pPG107 was amplified using 1307/01 gDNA. 
Both fragments were digested with Spel and cloned into 
the Spel cut mycobacterial attP-integrating shuttle vector 
pKINT (a gift from Douglas Young, Imperial College, 
London). Plasmids pPGlOS and pPG109, which contain a 
3.5 kb hsp'-nirB-nirD-'cobU fragment was constructed in 
several steps. Firstly, two 1.6 kb hsp'-nirB' PGR fragments 
were PGR amplified separately using primers nirBl_f and 
nirBl_r and 2122/97 and 1307/01 as genomic templates. 
Similarly, two 1.8kb 'nirB-nirD-'cobU fragments were 
amplified using primers nirB2_f and nirB2_r and genomic 
DNAs 2122/97 and 1307/01. The two PGR products were 
digested with Spel and BamHI and then co-ligated into 
pKINT. Details concerning the nucleotide sequences of 
the per primer pairs are given in the Additional file 5. 

5'-RLM-RACE PGR 

Transcriptional start site mapping of was determined 
using the First Ghoice RNA ligase-mediated rapid amp- 
lification of cDNA ends (RLM-RAGE) kit (Ambion) as 
per manufacturers instructions. Briefly, 10 ug of total 
RNA was treated with calf intestinal phosphatase (GIP) 
and tobacco acid pyrophosphatase (TAP) before ligation 
of an RNA Adapter oligonucleotide to the 5' ends of 
the mRNA transcripts. A random-primed reverse tran- 
scription reaction was carried out to generate cDNA 
and then nested PGR reactions were performed on the 
cDNA using combinations of adapter and gene specific 
primers. Details concerning the nucleotide sequences 
of the 5 ' outer and inner adapter sequences as well as 
3' outer and inner gene specific primers are given in 
the Additional file 5. PGR products generated using 
the 5 ' inner adapter and 3 ' inner gene specific primers 
were sequenced by Sanger sequencing using the 3' inner 
gene specific primer. 

Animal ethics statement 

Animal work was carried out according to the UK Animal 
(Scientific Procedures) Act 1986. The study protocol was 
approved by the AHVLA Animal Use Ethics Gommittee 
(UK Home Office PGD number 70/6905). 

Availability of supporting data section 

Fully annotated microarray data have been deposited in 
B(iG@Sbase (accession number: E-BUGS-150; http:// 
bugs.sgul.ac.uk/E-BUGS-150) and also ArrayExpress (ac- 
cession number: E-BUGS-150). Raw sequence data was 
uploaded to the European Nucleotide Archive (ENA) and 
can be downloaded at http://www.ebi.ac.uk/ena/data/view/ 
ERX006616-ERX06617,ERX012284-ERX012286. 
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Additional file 1: SNPs identified across sequenced strains. 

Additional file 2: Large deletions identified by NGS in sequenced 
strains. 

Additional file 3: Fold change differences in gene expression in 
M. bovis field isolates 1 121, 2451 and 1307 compared to 2122. 

Cells shaded in red indicate upregulation, green indicate down- 
regulation and empty cells indicate no change in expression. 

Additional file 4: Differential expression of RNA transcripts as 
detected by a tiled oligonucleotide microarray. Up and down arrows 
indicate fold up- and downregulation, respectively, and empty cells 
indicate no change in expression. *Strand, A indicates antisense, S, sense 
and I, intergenic. 

Additional file 5: Details of oligonucleotides used in PGR, RT-PCR 
and RLM-RACE experiments. 
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